Using NLP to Efficiently Visualize Text Collections with SOMs

نویسندگان

  • James Henderson
  • Paola Merlo
  • Ivan Petroff
  • Gerold Schneider
چکیده

Self-Organizing Maps (SOMs) are a good method to cluster and visualize large collections of text documents, but they are computationally expensive. In this paper, we investigate ways to use natural language parsing of the texts to remove unimportant terms from the usual bag-of-words representation, to improve efficiency. We find that reducing the document representation to just the heads of noun and verb phrases does indeed reduce the heavy computational cost without degrading the quality of the map, while more severe reductions which focus on subject and object noun phrases degrade map quality.

برای دانلود متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

منابع مشابه

Natural language processing and information retrieval methods for intellectual property analysis

More intellectual property information is generated now than ever before. The accumulation of intellectual property data, further complicated by this continued increase in production, makes it imperative to develop better methods for archiving and more importantly for accessing this information. Information retrieval (IR) is a standard technique used for efficiently accessing information in suc...

متن کامل

Exploration of Full-text Databases with Self-organizing Maps

Availability of large full-text document collections in electronic form has created a need for intelligent information retrieval techniques. Especially the expanding World Wide Web presupposes methods for systematic exploration of miscellaneous document collections. In this paper we introduce a new method, the WEBSOM, for this task. Self-Organizing Maps (SOMs) are used to represent documents on...

متن کامل

A combined IR/NLP approach to question answering against large text collections

We describe an approach to finding literal answer strings to natural language questions in large text collections. The approach involves linking an IR system with an NLP system that performs reasonably thorough linguistic analysis. The IR system treats the question as a query and returns a set of top ranked documents or passages. The NLP system parses the question and analyses the top ranked do...

متن کامل

A New Method for Improving Computational Cost of Open Information Extraction Systems Using Log-Linear Model

Information extraction (IE) is a process of automatically providing a structured representation from an unstructured or semi-structured text. It is a long-standing challenge in natural language processing (NLP) which has been intensified by the increased volume of information and heterogeneity, and non-structured form of it. One of the core information extraction tasks is relation extraction wh...

متن کامل

Visualization of Pareto Solutions by Spherical Self-Organizing Map and It’s acceleration on a GPU

In this study, we visualize Pareto-optimum solutions derived from multiple-objective optimization using spherical self-organizing maps (SOMs) that lay out SOM data in three dimensions. There have been a wide range of studies involving plane SOMs where Pareto-optimal solutions are mapped to a plane. However, plane SOMs have an issue that similar data differing in a few specific variables are oft...

متن کامل

ذخیره در منابع من


  با ذخیره ی این منبع در منابع من، دسترسی به آن را برای استفاده های بعدی آسان تر کنید

برای دانلود متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

عنوان ژورنال:

دوره   شماره 

صفحات  -

تاریخ انتشار 2002